class: center, middle, inverse, title-slide # Procesando datos con el paquete tidyverse - Joins ## Ciencias Sociales Computacionales y Humanidades Digitales --- <style type="text/css"> .remark-slide-content { font-size: 25px; padding: 1em 1em 1em 1em; } <style type="text/css"> .remark-code{ line-height: 1.5; font-size: 80% } @media print { .has-continuation { display: block; } } </style> </style> --- class: inverse, middle, center # joins <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/joins.png" width="35%" style="display: block; margin: auto;" /> --- class: inverse, middle, center # __left_join()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # left_join() <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/left-join.gif" width="50%" style="display: block; margin: auto;" /> --- # Ejemplo de uso - **Caso:** Quiero conocer el peso de los casos confirmados y fallecidos de cada provincia en relación a su respectiva población (proyección a 2021). ```r library(data.table) base_covid_total <- fread("Fuentes/Covid19Casos.csv", header = T) ``` --- # Armo tabla con cantidad de confirmados y fallecidos ```r tabla_casos <- base_covid_total %>% rename(provincia = residencia_provincia_nombre) %>% group_by(provincia) %>% summarise(confirmados = sum(clasificacion_resumen == "Confirmado"), fallecidos = sum(fallecido == "SI")) ``` ```r head(tabla_casos) ``` ``` # A tibble: 6 x 3 provincia confirmados fallecidos <chr> <int> <int> 1 Buenos Aires 1590330 47763 2 CABA 411845 11385 3 Catamarca 22916 173 4 Chaco 57364 1564 5 Chubut 62173 1106 6 Córdoba 304820 4568 ``` --- # Importo [tabla con proyecciones](https://www.indec.gob.ar/indec/web/Nivel4-Tema-2-24-85) ```r pob_est_2021 <- fread("Fuentes/estimacion_pob_2021_provincias.csv", header = T, encoding = "UTF-8") ``` ```r head(pob_est_2021) ``` ``` Provincia pob_estimada_2021 1: CABA 3078836 2: Buenos Aires 17709598 3: Catamarca 418991 4: Córdoba 3798261 5: Corrientes 1130320 6: Chaco 1216247 ``` --- count: false # left_join() .panel1-left_join-auto[ ```r *tabla_casos ``` ] .panel2-left_join-auto[ ``` # A tibble: 25 x 3 provincia confirmados fallecidos <chr> <int> <int> 1 Buenos Aires 1590330 47763 2 CABA 411845 11385 3 Catamarca 22916 173 4 Chaco 57364 1564 5 Chubut 62173 1106 6 Córdoba 304820 4568 7 Corrientes 50475 714 8 Entre RÃos 84291 1719 9 Formosa 28858 462 10 Jujuy 28054 1122 # ... with 15 more rows ``` ] --- count: false # left_join() .panel1-left_join-auto[ ```r tabla_casos %>% * left_join(pob_est_2021, * by = c("provincia" = "Provincia")) ``` ] .panel2-left_join-auto[ ``` # A tibble: 25 x 4 provincia confirmados fallecidos pob_estimada_2021 <chr> <int> <int> <int> 1 Buenos Aires 1590330 47763 17709598 2 CABA 411845 11385 3078836 3 Catamarca 22916 173 418991 4 Chaco 57364 1564 1216247 5 Chubut 62173 1106 629181 6 Córdoba 304820 4568 3798261 7 Corrientes 50475 714 1130320 8 Entre RÃos 84291 1719 1398510 9 Formosa 28858 462 610019 10 Jujuy 28054 1122 779212 # ... with 15 more rows ``` ] --- count: false # left_join() .panel1-left_join-auto[ ```r tabla_casos %>% left_join(pob_est_2021, by = c("provincia" = "Provincia")) %>% * mutate(tasa_conf = round(confirmados / pob_estimada_2021 * 100, 2), * tasa_fall = round(fallecidos / pob_estimada_2021 * 100, 2)) ``` ] .panel2-left_join-auto[ ``` # A tibble: 25 x 6 provincia confirmados fallecidos pob_estimada_2021 tasa_conf tasa_fall <chr> <int> <int> <int> <dbl> <dbl> 1 Buenos Aires 1590330 47763 17709598 8.98 0.27 2 CABA 411845 11385 3078836 13.4 0.37 3 Catamarca 22916 173 418991 5.47 0.04 4 Chaco 57364 1564 1216247 4.72 0.13 5 Chubut 62173 1106 629181 9.88 0.18 6 Córdoba 304820 4568 3798261 8.03 0.12 7 Corrientes 50475 714 1130320 4.47 0.06 8 Entre RÃos 84291 1719 1398510 6.03 0.12 9 Formosa 28858 462 610019 4.73 0.08 10 Jujuy 28054 1122 779212 3.6 0.14 # ... with 15 more rows ``` ] --- count: false # left_join() .panel1-left_join-auto[ ```r tabla_casos %>% left_join(pob_est_2021, by = c("provincia" = "Provincia")) %>% mutate(tasa_conf = round(confirmados / pob_estimada_2021 * 100, 2), tasa_fall = round(fallecidos / pob_estimada_2021 * 100, 2)) %>% * select(1, starts_with("tasa")) ``` ] .panel2-left_join-auto[ ``` # A tibble: 25 x 3 provincia tasa_conf tasa_fall <chr> <dbl> <dbl> 1 Buenos Aires 8.98 0.27 2 CABA 13.4 0.37 3 Catamarca 5.47 0.04 4 Chaco 4.72 0.13 5 Chubut 9.88 0.18 6 Córdoba 8.03 0.12 7 Corrientes 4.47 0.06 8 Entre RÃos 6.03 0.12 9 Formosa 4.73 0.08 10 Jujuy 3.6 0.14 # ... with 15 more rows ``` ] --- count: false # left_join() .panel1-left_join-auto[ ```r tabla_casos %>% left_join(pob_est_2021, by = c("provincia" = "Provincia")) %>% mutate(tasa_conf = round(confirmados / pob_estimada_2021 * 100, 2), tasa_fall = round(fallecidos / pob_estimada_2021 * 100, 2)) %>% select(1, starts_with("tasa")) ``` ] .panel2-left_join-auto[ ``` # A tibble: 25 x 3 provincia tasa_conf tasa_fall <chr> <dbl> <dbl> 1 Buenos Aires 8.98 0.27 2 CABA 13.4 0.37 3 Catamarca 5.47 0.04 4 Chaco 4.72 0.13 5 Chubut 9.88 0.18 6 Córdoba 8.03 0.12 7 Corrientes 4.47 0.06 8 Entre RÃos 6.03 0.12 9 Formosa 4.73 0.08 10 Jujuy 3.6 0.14 # ... with 15 more rows ``` ] <style> .panel1-left_join-auto { color: black; width: 45.7333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-left_join-auto { color: black; width: 52.2666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-left_join-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class: inverse, middle, center # __righ_join()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # right_join() <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/right-join.gif" width="50%" style="display: block; margin: auto;" /> --- class: inverse, middle, center # __inner_join()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # inner_join() <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/inner-join.gif" width="50%" style="display: block; margin: auto;" /> --- class: inverse, middle, center # __full_join()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # full_join() <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/full-join.gif" width="50%" style="display: block; margin: auto;" /> --- class: inverse, middle, center # __anti_join()__ <html> <div style='float:left'></div> <hr color='#EB811B' size=1px width=1125px> </html> --- # anti_join() <img src="data:image/png;base64,#C:/Users/pablo/Documents/Pablo/Cursos R/R+UNSAM/M1_Diplo_CSS/Clase 3 - Tidyverse/img/anti-join.gif" width="50%" style="display: block; margin: auto;" />